ABSTRACT


The interoperability among distributed and autonomous sys- 
tems is the ultimate challenge facing the semantic web. Het- 
erogeneity of data representation is the main source of prob- 
lems. This paper proposes an innovative solution that com- 
bines lexical approaches and language games. The benefits 
for distributed annotation systems on the web are twofold: 
firstly, it will reduce the complexity of the semantic problem 
by moving the focus from the full-featured ontology level to 
the simpler lexicon level; secondly, it will avoid the draw- 
back of a centralized third party mediator that may become 
a single point of failure. 

The main contributions of this work are concerned with 
(1) providing a proof of concept that language games can be 
an effective solution to creating and managing a distributed 
process of agreement on a shared lexicon, (2) describing a 
fully distributed service oriented architecture for language 
games, (3) providing empirical evidence on a real world case 
study in the domain of ski mountaineering. 


Categories and Subject Descriptors 


C.2.4 [Distributed Systems]: Distributed Applications; 
H.3.5 [Online Information Services]: Web-based Ser- 
vices 


General Terms 
Design 


Keywords 


Emergent Semantics, Distributed Annotations, Language 
Games, Interoperability 


1. INTRODUCTION 


The interoperability among distributed and autonomous 
systems is the ultimate challenge facing the semantic web. 
The open issue is how to preserve the requirement of local- 
ity for representations, while at the same time enabling an 
effective interaction among autonomous peers. 

The heterogeneity of autonomous representations is the
source of the problem. Considerable effort is usually spent
arranging well-defined ontologies whilst neglecting the issues
related to their widespread use.

In our work we conceive the problem of interoperability 
as the problem of supporting the emergence of a common 
lexicon among a community of peers. A shared lexicon en- 
ables a common denotation that allows different peers to 
refer to the same object using the same label. Building a 
shared lexicon does not necessarily require peers to share 
the underlying schemas designed to represent objects. 

Recent studies in language evolution have developed the 
notion of language games. A language game is a compu- 
tational model that allows convergence on a shared lexicon 
in a fully distributed framework. The key idea behind this 
model is that a shared lexicon emerges from adaptive pair- 
wise interactions between language users and continues to 
evolve and adapt through repeated interactions. 

Unlike the usual solution promoted by the semantic web,
where the official vocabulary or lexicon is preserved by a
third party, usually a consortium for standard preservation,
in the language game model the shared lexicon is fully
distributed amongst all the peers. The main advantage is a
distributed system without a single point of failure.

The original contribution of this paper is concerned with a 
proof of concept that language games can be an effective so- 
lution to the interoperability problem among heterogeneous 
and autonomous systems. In particular, we address a spe- 
cific application problem that occurs in distributed annota- 
tion systems. As an example of such applications we refer to 
a real world case study in the domain of ski mountaineering. 

The paper includes a discussion of related works, a brief 
introduction to the language games, a service oriented ar- 
chitecture to enable the interoperability among distributed 
annotation systems and finally the empirical results of a 
case study based on three real world web sites devoted to 
ski mountaineering. 


2. SKI MOUNTAINEERING 


Ski mountaineering is a very exciting outdoor activity. In 
ski mountaineering, both the ascent and descent of a peak 
are made entirely on skis, using climbing skins and perhaps 
ski crampons for traction on the ascent, and then descending 
a continuous ski route back down to the base. This sport can 
be very risky. Avalanches represent a ubiquitous hazard
that may arise from an erroneous situation assessment. 

To prevent or reduce the avalanche hazard, it is common
practice for ski mountaineers to share their experiences
on the web. The typical behaviour of ski mountaineers is to
collect, the day before a ski trip, all the annotations from
on-line diaries on the ski routes of interest. When the ski
route has been completed, the ski mountaineers note in their
diaries the up-to-date conditions of the route.

For ski mountaineering in the Alps there are many web
sites, among them skirando¹, gulliver², moleskiing³. These
are all organized along the same pattern: a catalog of ski 
routes and a collection of individual diaries of ski trips where 
a diary entry describes a ski trip that refers to a ski route in 
the catalog. In such a way, given a ski route, it is straight- 
forward to retrieve the most recent annotations, i.e. the 
related diary entries. 

The scenario above becomes more effective as more ski
trip reports are collected. However, although it may appear
counterintuitive, the proliferation of web sites devoted
to ski mountaineering does not necessarily increase the
accessibility of ski trip diaries. This is because a side effect
of the increasing number of web sites is the partition of the
ski mountaineers into smaller communities that refer to
heterogeneous ski route catalogs.


3. DISTRIBUTED ANNOTATIONS 


The example above is a particular instance of a more gen- 
eral scenario that can be referred to as distributed anno- 
tations. The habit of allowing end users to provide their 
opinions of a given item as annotations, e.g., about books, 
movies, hardware, is ubiquitous. 

Blog oriented architectures for annotations introduce the 
notion of aggregation services, third party servers in charge 
of indexing all the annotations with respect to a given cat- 
egory of items. Such services allow a user to obtain all the 
recent blog annotations about a given item on demand. 

Of course the tacit assumption of aggregation services is 
that the annotated items have to be indexed using a common 
reference system. While for books ISBN provides a straight- 
forward solution, more often an agreement on a common 
reference system does not exist. 

Catalogs are usually distributed and autonomously de- 
signed. Even though they are concerned with the same cat- 
egory of items the heterogeneity of representations is perva- 
sive. For this reason aggregation services need to be paired 
with alignment services that enable the mapping among dif- 
ferent catalogs. 

Catalog alignment can be conceived as a typical problem
of interoperability that precludes having a fully distributed 
annotation system. In the remaining part of the paper we 
will refer to the domain of ski mountaineering but it is worth- 
while to note that the pattern of solution is independent 
from the specific application scenario. 


4. RELATED WORKS 


The problem of catalog alignment has been approached 
by many initiatives in the context of the Semantic Web. 
The usual strategy of these efforts consists in establishing 
a relationship between the local representations and a com- 
mon reference encoding, namely a shared ontology. This ap- 
proach requires two steps: (1) the definition of an ontology 
for the specific domain, (2) the definition of a mapping
between a local representation and the shared ontology. While
intuitive, this approach is often not effective in practice. In
fact, the first step raises the question of who is in charge of
managing the shared ontology; moreover, the mapping step is
generally far from trivial and too often requires manual
intervention.


¹ http://www.skirando.ch/
² http://www.gulliver.it/
³ http://www.moleskiing.it

Designed around this general schema, a number of ini- 
tiatives have arisen, fostered by the availability of machine 
processable semantics expressed in meta-models such as [14, 
8]. One of the most recent examples is the European project 
Harmonise [9, 10]. In Harmonise, a mediator is in charge of 
managing a shared representation of tourism concepts, while 
subscribers have to map their local encodings with respect 
to the predefined ontology. To confirm the drawbacks of 
this approach, it is interesting to note that Harmonise, af- 
ter the conclusion of the project, is currently dealing with 
the problem of establishing a consortium and the related 
sustainability plan. Furthermore, the mapping task is at 
present performed manually. 

Recent research has focused on the issues specifically
related to the mapping between schemas [4, 11, 12, 13] or
shallow representations like taxonomies [7, 22]. The automation
of the mapping process would enable pervasive interoperability
without the constraint of a mediator [6], which in
distributed applications becomes the “single point of failure”.
In such a scenario, each peer would manage autonomously 
the mapping with respect to the other peers. Unfortunately, 
automatic schema matching has proven to be a complex 
problem and a general solution is still not available. A fur- 
ther problem lies in the fact that this approach does not 
scale well when applied to a fully distributed environment. 
In fact, while in centralized approaches like Harmonise only 
one lexicon is needed to map the local reference denotation 
with respect to a global one, a strategy based on pairwise 
mappings has a quadratic complexity with respect to the 
number of peers. 
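
As a rough illustration of this scaling argument, assume one mapping per unordered pair of peers (an assumption made here for simplicity; directed mappings would double the pairwise count). For $n$ peers the number of mappings to maintain is then

\[
M_{\mathrm{central}}(n) = n, \qquad M_{\mathrm{pairwise}}(n) = \binom{n}{2} = \frac{n(n-1)}{2}.
\]

For $n = 3$ the two counts coincide (3 and 3), but for $n = 10$ they are 10 and 45, and for $n = 100$ they are 100 and 4950.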

Gossiping algorithms [1] have been proposed to reduce the 
scaling problem. In this approach a (partial) solution to the 
schema mapping is provided only for a small portion of the 
complete peers set. The undefined mappings are derived 
through a transitive exploration of a peer’s neighborhood. 
Although this strategy is promising, the drawback of pro- 
viding manual mappings prevents a full automation of the 
process of interoperability. 

Alternative solutions have been proposed that aim to re- 
duce the complexity of the mapping problem by moving from 
the schema to the object level [21]. Object mapping differs 
from schema mapping. In object mapping the assessment 
is performed looking at attribute values of the objects. Of 
course, such an approach is less general than schema mapping.
A good schema mapping makes it possible to derive all
the correct mappings for all the objects. On the contrary, 
the performance of an algorithm working at the object level 
is affected by the specific attribute value distribution. 

A final research trend that is receiving increasing interest 
looks at the problem of interoperability in terms of a shared 
lexicon. A W3C initiative, SKOS [16], has been organized to 
explicitly manage a shared vocabulary for concepts denota- 
tion. Similarly to Harmonise, SKOS relies on the restrictive 
hypothesis that autonomous peers will subscribe to such a 
common vocabulary. 

Figure 1: The pairwise roles


The fundamental question, then, is how to exploit the
benefits of a single lexicon for each peer without incurring
the restrictions of a centralized global reference that is
controlled by a third party. We give a solution to this question
by leveraging the approach of language games. With
language games [19, 17], the language (or the ontology, the 
lexicon) of a community of peers is considered a system that 
emerges from adaptive interactions among peers. Thus, in 
contrast with the techniques we have described before, each 
peer starts with a preliminary hypothesis of lexicon and the 
challenge is to enable a process of interactions that brings 
the peers’ lexica to converge on a single common denotation 
system. The language games approach has been validated by 
simulations in different domains, e.g., bookmark taxonomies 
[2], robot communications [18]. 

In the following we will show how language games can 
be exploited to deal with the problem of supporting the 
emergence of a distributed common lexicon without the con- 
straint of a centralized third party mediator. We first in- 
troduce the basic notions of language games with reference 
to the ski mountaineering domain and then we describe a 
service oriented architecture developed to enable the inter- 
operability among the ski mountaineering web sites of the 
Alps. 


5. LANGUAGE GAMES 


There are many variations of language games. In the
following we will focus our attention on a specific model known
as naming games [20]. A naming game is defined by a set
of peers $P$ (the game players), a set of objects $O$ (the
heterogeneous representations of the ski routes), and a set of
labels $\mathcal{L}$ (the candidate names to denote the ski routes). A
peer $p \in P$ is then defined as a pair $p = \langle L_p, O_p \rangle$.

Each peer $p \in P$ has its own lexicon $L_p$ drawn from the
Cartesian product $O_p \times \mathcal{L}_p \times \mathbb{N} \times \mathbb{N}$, where $O_p$ are the
objects referenced by $p$, $\mathcal{L}_p$ is the local vocabulary of $p$, and
$\mathbb{N}$ are the natural numbers used to represent the strength of
the association between $O_p$ and $\mathcal{L}_p$. The lexicon may include
synonymous labels, where two labels are associated with the
same object, and homonymous labels, where the same label
is associated with two different objects.

The following table illustrates a sample lexicon. From the
table, for example, we can see that the association between
object $o_1$ and label $l_1$ has been successfully used 8 times in
10 different language games, while the association between
$o_1$ and label $l_2$ has been successful only once in 8 games.
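
As an illustration only, a lexicon of this kind could be held in memory as a list of (object, label, used, agreed) entries. The sketch below is written in Python, which is not the language of the prototype. The names `Entry`, `synonyms` and `homonyms` are hypothetical, the first two entries mirror the counts quoted above under the assumption that both refer to the same object, and the third entry is invented to show a homonym.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Entry:
    obj: str      # identifier of a local object (e.g. a ski route)
    label: str    # candidate name for the object
    used: int     # number of games in which the association was used
    agreed: int   # number of those games that ended in agreement


def synonyms(lexicon):
    """Return {object: labels} for objects associated with several labels."""
    by_obj = defaultdict(set)
    for e in lexicon:
        by_obj[e.obj].add(e.label)
    return {o: sorted(ls) for o, ls in by_obj.items() if len(ls) > 1}


def homonyms(lexicon):
    """Return {label: objects} for labels associated with several objects."""
    by_label = defaultdict(set)
    for e in lexicon:
        by_label[e.label].add(e.obj)
    return {lab: sorted(objs) for lab, objs in by_label.items() if len(objs) > 1}


lexicon = [
    Entry("o1", "l1", used=10, agreed=8),
    Entry("o1", "l2", used=8, agreed=1),
    Entry("o2", "l1", used=3, agreed=1),  # hypothetical entry, shows a homonym
]
```

Here `synonyms(lexicon)` reports that `o1` carries two labels, and `homonyms(lexicon)` reports that `l1` denotes two objects.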

The ultimate goal of the game is to bring the local lexica 
of the peers towards the same association structure. If all 
the peers converge to the same label to denote the same 
object the lexica will enable effective communication among 
peers. 
A naming game involves an iterative process based on
pairwise sessions. The basic interaction involves two peers 
with different roles: speaker and hearer, and so a session 
of communication is not symmetric. Nevertheless each peer 
can play different roles in different sessions. 

The interaction proceeds as follows (see Figure 1). First
the speaker $p_s$ selects an object $o_s \in O_s$ from its set of
objects, and encodes $o_s$ using a label $l_j$. The label is chosen
according to the preferences expressed in the current version
of the local lexicon $L_s$ (local to speaker $p_s$). The encoding of
object $o_s$ is obtained by looking at the most successful label.
A label $l_j$ is more successful than a label $l_k$ iff $(o_s, l_j, u_j, a_j) \in L_s$,
$(o_s, l_k, u_k, a_k) \in L_s$, $u_j > u_k$ and either $a_j/u_j > a_k/u_k$
or $a_j/u_j = a_k/u_k$ and $u_j > u_k$, where $u_j$ represents how
many times the label $l_j$ has been used and $a_j$ represents
how many times there was an agreement on label $l_j$ with
other peers. In case of a tie, a random choice is performed.
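
The selection rule can be sketched in a few lines of Python (an illustration, not the prototype's code). Entries are assumed to be `(label, used, agreed)` tuples for a single object; the function names are hypothetical. Ratios are compared by cross-multiplication so that never-used labels do not cause a division by zero; how such labels should rank is not specified in the text, so that behaviour is an assumption.

```python
import random


def more_successful(a, b):
    """True if entry a = (label, used, agreed) beats entry b under the rule above."""
    _, u_a, g_a = a
    _, u_b, g_b = b
    if not u_a > u_b:
        return False
    # Compare g_a/u_a with g_b/u_b by cross-multiplication (no division).
    left, right = g_a * u_b, g_b * u_a
    return left > right or (left == right and u_a > u_b)


def encode(entries, rng=random):
    """Pick a label for one object among the entries no other entry beats.

    Ties between unbeaten entries are broken at random, as in the text.
    Raises IndexError if entries is empty.
    """
    best = [e for e in entries
            if not any(more_successful(other, e) for other in entries)]
    return rng.choice(best)[0]
```

With the two sample associations quoted earlier, `encode([("l1", 10, 8), ("l2", 8, 1)])` selects `l1`, since it has both the higher usage count and the higher agreement ratio.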

The hearer $p_h$ decodes the label $l_j$ and retrieves the
associated object, $o_h \in O_h$, by looking at its own lexicon $L_h$.
The actuation step is in charge of sending the object $o_h$ to
the speaker $p_s$.

The last step is concerned with assessment. The speaker 
has to verify that the object received from the hearer is the 
same as that selected at the beginning of the communication 
session. 

If the object referred to by the hearer is the same as that
selected by the speaker, both of them positively reinforce their
lexica by updating the corresponding label-object association
as follows: $(o_s, l_j, u_j + 1, a_j + 1) \in L_s$ and
$(o_h, l_j, u_j + 1, a_j + 1) \in L_h$. If the hearer replies with a different
object, $o_s \neq o_h$, the communication has failed and the
peers’ lexica are negatively reinforced by increasing only the
usage counters of the lexical relation (while the agreement
counters remain the same): $(o_s, l_j, u_j + 1, a_j) \in L_s$
and $(o_h, l_j, u_j + 1, a_j) \in L_h$.
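
The two reinforcement cases can be sketched as follows, in Python and for illustration only. Each lexicon is assumed to be a dictionary from `(object, label)` to `(used, agreed)`; the function names and the `same_object` callback, which stands for the assessment step, are hypothetical.

```python
def reinforce(lexicon, obj, label, success):
    """Always increase the usage counter; increase agreement only on success."""
    used, agreed = lexicon.get((obj, label), (0, 0))
    lexicon[(obj, label)] = (used + 1, agreed + (1 if success else 0))


def assess_and_update(speaker, hearer, o_s, o_h, label, same_object):
    """Assessment followed by the update of both lexica.

    o_s is the object chosen by the speaker, o_h the one returned by the
    hearer, and same_object(o_s, o_h) decides whether they denote the
    same thing.
    """
    success = same_object(o_s, o_h)
    reinforce(speaker, o_s, label, success)
    reinforce(hearer, o_h, label, success)
    return success


# Hypothetical data: two peers that both use label "l1".
speaker = {("s:3002", "l1"): (10, 8)}
hearer = {("h:3940", "l1"): (4, 2)}
same = lambda a, b: {a, b} == {"s:3002", "h:3940"}

ok = assess_and_update(speaker, hearer, "s:3002", "h:3940", "l1", same)
failed = assess_and_update(speaker, hearer, "s:3002", "h:17", "l1", same)
```

After the successful game both counters of each association grow by one; after the failed game only the usage counters grow.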

The critical point of the game is the assessment step. Ob- 
jects can refer to heterogeneous representations of the same 
concept and therefore the assessment step needs to be care- 
fully implemented. Looking at our example, objects can be 
different instances of different schemas that refer to the same 
ski route. An assessment strategy could exploit the mapping 
between the two schemas, but as we have seen before this
task is too complex. An alternative strategy is to assess the
equivalence by looking directly at the data. In Section 7 we 
will provide the details of the implementation choice for our 
case study on ski mountaineering. 


6. SYSTEM ARCHITECTURE 


In the previous sections we have defined the problem space 
that we are exploring and the language games-based ap- 
proach we leverage. In this section, we will illustrate the 
software component that we have designed and implemented 
to concretely realize our technique. We defer to the next 
section the discussion of how this component is used in a 
specific scenario, namely the domain of ski mountaineering. 


6.1 Requirements 


The reference scenario that drove the design task is similar 
to the one presented in section 2. Precisely, we assume that 
our solution is to complement a network of web applications 
each providing a legacy catalog system and an annotation 
service. The catalog system is a repository of representa- 
tions of items (objects in the terminology of section 5), e.g., 
books, ski routes, bookmarks. The annotation service col- 
lects users’ reviews on cataloged objects. While the systems 
we are targeting appear to be fairly distinct, they are in- 
stances of a more general type of application that has a 
private catalog of objects and can share references to these 
objects. Such systems are indeed very common on today’s 
Web, with Amazon and Epinions being typical examples. 

The main functional goals of our component are to allow 
the alignment of catalogs and to support the distributed 
aggregation of annotations created by related applications. 
The basic requirements that we identified consist in preserv- 
ing the heterogeneity, the autonomy and the robustness to 
evolution of these applications. To support heterogeneity, 
we require only the minimal set of architectural constraints 
and allow for alternative implementations of various parts of 
our component. We guarantee autonomy by avoiding strong 
or centralized coordination among individual systems. Fi- 
nally, we explicitly take into account that the system is in- 
herently dynamic and subject to change. 

In order to employ the language games technique in this 
scenario, we needed to realize and integrate a distributed 
implementation of the language games model. Some imple- 
mentations of language games already exist, e.g., McIntyre 
in [15] describes a testbed to set up, control and visualize 
language games simulations. However, to our knowledge, all 
the available implementations consist of stand-alone simula- 
tors that lack the distributed nature of the model and do not 
allow it to be deployed in a real world setting. We further 
require that the implementation of the language games be 
flexible, to easily implement possible variants to the model, 
and easy to program and test. 

Lastly, we wanted to minimize the cost of augmenting 
existing solutions with our component. This requirement 
translates into realizing a component that can be transpar- 
ently plugged into an existing system, requiring as few mod- 
ifications as possible to its legacy parts. 


6.2 Implementation 


Figure 2 represents the architecture of our component as
it was shaped by the requirements described above. The
Application Server represents the generic legacy systems we are
focusing on; DiAGRA (Distributed AGgRegator of Annotations)
is the module in charge of implementing the aggregation
functionalities; DiCA (Distributed Catalog Alignment)
realizes the catalog alignment feature.


Figure 2: The architecture of the component.


6.3  DiAGRA 


The DiAGRA module decouples inter-peer communica- 
tions from the details of how each peer’s catalog is orga- 
nized. 

To do so, it maintains the current public lexicon, a dic- 
tionary that maps local objects to labels. All the commu- 
nication is proxied through DiAGRA which performs this 
mapping task. DiAGRA uses this lexicon to enhance the 
aggregation primitives of trackbacking and crawling. Track- 
backing refers to the broadcasting of new local annotations 
to all peers in the federation: it is used to “push” annota- 
tions to remote sites. Conversely, the crawling facility allows 
a peer to “pull” annotations from remote peers. Figure 3 
describes the flow of operations performed when a new an- 
notation is inserted by a local user. Figure 4 represents how 
remote annotations are retrieved. Note how the translation 
operations performed by DiAGRA are transparent to the 
Application Server. 


Figure 3: Sequence diagram for the trackback operation.
Participants: DiAGRA (local), DiAGRA (remote), AS (remote).
Messages: 1: trackback(url, lk1); 1.1: gk := getGlobalKey(lk1);
1.2: trackback(url, gk); 1.2.1: lk2 := getLocalKey(gk);
1.2.2: trackback(url, lk2).


In the trackback scenario, when a user posts a new comment
with respect to a local item, the Application Server
notifies DiAGRA of the new annotation available. DiAGRA
looks up its lexicon and retrieves the label associated with
the annotated object. It substitutes this label for all the
references to the local item in the annotation and then sends
the annotation to the other peers in the federation. A peer
receiving the annotation performs the opposite translation,
from the label to the local item. The translated annotation,
now containing only local references, is made available to
local users as if it had been produced locally.


Figure 4: Sequence diagram for the crawling operation.

In the crawling scenario, a peer asks remote peers to pro- 
vide the most recent annotations inserted by their users. 
At the remote peer, the crawling request is handled by the 
DiAGRA component. It fetches the annotations from the 
Annotation Server, translates references to local objects to 
the associated labels defined in the lexicon, and sends back 
the translated annotations. The first peer is now able to 
perform the opposite translation and presents the received 
annotations to local users. 

The translation processes that we have described may fail 
in two cases: when there is no mapping available for an 
object - meaning that it is not shared by other peers of 
the federation - or when a received label has no mapping - 
that is, it is encoding an unknown object. In both cases, 
DiAGRA simply drops the faulty annotations. 
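
The translation steps and the two failure cases can be sketched as follows. This is a toy Python model written for illustration, not the interface of the prototype. The method names follow the operations getGlobalKey and getLocalKey shown in Figure 3; the class name, the label `label-17`, the URLs and the one-to-one lexicon are assumptions. The route identifiers 3002 and 3940 are taken from the sample linearizations in Section 7.

```python
class Diagra:
    """Toy model of DiAGRA's translation role (hypothetical API)."""

    def __init__(self, lexicon):
        # lexicon: local key -> shared label ("global key"); assumed one-to-one.
        self.to_global = dict(lexicon)
        self.to_local = {gk: lk for lk, gk in self.to_global.items()}

    def get_global_key(self, local_key):
        return self.to_global.get(local_key)

    def get_local_key(self, global_key):
        return self.to_local.get(global_key)

    def outgoing(self, annotations):
        """(url, local key) -> (url, global key); unmapped objects are dropped."""
        pairs = ((url, self.get_global_key(lk)) for url, lk in annotations)
        return [(url, gk) for url, gk in pairs if gk is not None]

    def incoming(self, annotations):
        """(url, global key) -> (url, local key); unknown labels are dropped."""
        pairs = ((url, self.get_local_key(gk)) for url, gk in annotations)
        return [(url, lk) for url, lk in pairs if lk is not None]


sender = Diagra({"3002": "label-17"})
receiver = Diagra({"3940": "label-17"})

sent = sender.outgoing([("http://example.org/trip/1", "3002"),
                        ("http://example.org/trip/2", "4711")])  # 4711 is unmapped
received = receiver.incoming(sent)
```

Only the annotation on route 3002 leaves the sender, and the receiver sees it attached to its own route 3940.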

In summary, DiAGRA makes it possible to dynamically
parametrize communication with a desired lexicon. Therefore,
it solves the problem of how an Application Server, given a 
lexicon common to a community of peers, can start using the 
lexicon in its communications without modifying the legacy 
catalog. Next we will see how such a common lexicon can be 
provided. This is the task of the DiCA module, described 
in the following section. 


6.4 DiCA 


The DiCA module encapsulates the language games tech- 
nique by implementing the model outlined in section 5. 

DiCA’s primary task is to use the language games approach
to adaptively build and refine a common lexicon.
Specifically, its responsibilities are threefold. Firstly, it
defines the choreography of the distributed system, by
specifying the possible inter-peer communication methods. This
is achieved by providing primitives to send a peer a label,
a label and a set of objects, a set of objects or a feedback
message. Secondly, it stores a variety of implementations
for each step of a language game. For instance, it contains
different primitives to select the next peer to play with, such
as random selection or round robin selection. Finally, it
provides the mechanisms to create and run game strategies,
i.e., to express how the communication and game primitives
should be combined to play a language game.


Figure 5: A language game.

Figure 5 shows two DiCA components engaged in a lan- 
guage game interaction. In this case, the choreography con- 
sists of the exchange of three messages: the speaker sends 
the hearer a label; the hearer sends an object to the speaker; 
the speaker sends the result of the game to the hearer. Each 
self-message in the diagram represents the invocation of a 
primitive operation. By selecting different primitives one 
can define new strategies. For example, a strategy where 
peers are contacted in a round robin fashion can be obtained 
by substituting the invocation of the PickPeerRandom prim- 
itive with the PickPeerRoundRobin. 
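
The idea of swapping primitives inside a strategy can be sketched as follows. The prototype expresses strategies in BPEL4WS; this Python fragment is only an illustration. The two selection functions are named after the PickPeerRandom and PickPeerRoundRobin primitives mentioned above, while `make_strategy` and the `play_game` callback are hypothetical.

```python
import itertools
import random


def pick_peer_random(peers, rng=random):
    """Return a primitive that picks the next peer uniformly at random."""
    return lambda: rng.choice(peers)


def pick_peer_round_robin(peers):
    """Return a primitive that cycles through the peers in order."""
    cycle = itertools.cycle(peers)
    return lambda: next(cycle)


def make_strategy(pick_peer, play_game):
    """Combine a peer-selection primitive with a game primitive."""
    def run(rounds):
        return [play_game(pick_peer()) for _ in range(rounds)]
    return run


# play_game is stubbed out here: it just reports which peer was contacted.
strategy = make_strategy(pick_peer_round_robin(["gulliver", "skirando"]),
                         play_game=lambda peer: peer)
contacted = strategy(3)
```

Replacing `pick_peer_round_robin` with `pick_peer_random` changes the strategy without touching the rest of the game.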

A complete description of the DiCA module is beyond the scope
of this paper. We invite the interested reader to refer to [3]
for further details. 

The decomposition of the system into DiAGRA and DiCA 
allows us to keep clearly separated the tasks of building 
a common lexicon, which is performed by DiCA, and the 
task of using the lexicon, performed by DiAGRA, to pro- 
vide interoperable services. The main functional effect of 
this separation is that the stable lexicon is less sensitive to 
fluctuations in the process of agreeing on a common lexicon. 

From the technological point of view, the interfaces be- 
tween the Annotation Server, DiAGRA and DiCA are REST 
interfaces. Communication between peers is performed us- 
ing web services technology. 


7. A REAL WORLD APPLICATION


The architecture we have presented in the previous section 
is general enough to fit many different scenarios. However, 
in order to practically assess it, we grounded the architecture 
in the ski routing scenario presented in section 2. 

The web applications we are targeting are gulliver, ski- 
rando and moleskiing. Each of these web sites is the center 
of a ski mountaineering community to which it offers the 
services of a ski route catalog and a ski trip annotation list. 

This scenario constitutes a fine example of the seman- 
tic interoperability problem: because these communities are 
completely autonomous and heterogeneous, they use differ- 
ent schemas to describe ski routes and denote the same 
routes using different names. As a consequence, annota- 
tions on trips performed along the same routes cannot be 
shared among communities. Furthermore, in the ski moun- 
taineering domain there is currently no effort leading to the 
formation of a shared ontology nor is it foreseeable in the 
future. Thus, it represents an ideal scenario for the applica- 
tion of the language games approach. 

Let us, then, examine how the advertising model maps 
to the ski mountaineering domain. Ski mountaineering web 
sites play the role of peers and ski routes map to objects. 
They are private to each peer, in the sense that a peer is free 
to model a route according to the schema it prefers. The role 
of objects is played by concrete representations of ski route 
models. A convenient way to represent a ski route is to 
provide an XML linearization of the information available 
for the route. The following are the linearizations for the 
same route as modeled by two different ski mountaineering 
web sites: 


<route> 
<id_route>3002</id_route> 
<top>Altissimo di Nago</top> 
<route>da S.Giacomo</route> 
<area>Trentino</area> 
<municipality>MORI</municipality> 
<valley> 
Valle dell’Adige (Alto Garda - Baldo) 
</valley> 
<difficulty>MS</difficulty> 
<exposure>SE</exposure> 
<start_height>1150</start_height> 
<top_height>2078</top_height> 
<gap>930</gap> 
<start_point>s. Giacomo</start_point> 
</route> 


<route> 
<id>3940</id> 
<top>Altissimo di Nago</top> 
<region>Adamello</region> 
<title>Da San Giacomo</title> 
<global_difficulty>AD</global_difficulty> 
<ski_difficulty>S4</ski_difficulty> 
<base_height>1194</base_height> 
<top_height>2079</top_height> 
<gap>900</gap> 
<exposure>E</exposure> 

</route> 


A typical use case scenario is the following. The DiCA
module on moleskiing is activated and starts playing
language games with gulliver and skirando. When the shared
lexicon starts emerging, a snapshot of the lexicon is provided
to DiAGRA. At this point, DiAGRA holds a mapping for
some of the routes that are common to other web sites. For
these routes, moleskiing can retrieve and present to its users
annotations inserted on gulliver or skirando, and conversely
share annotations produced locally.

The current prototype uses Tomcat and the Axis toolkit to
support the SOAP protocol, and Java for the DiAGRA and 
DiCA implementations. The language used to define game 
strategies is BPEL4WS, a composition language normally 
used to perform web service orchestration. We chose it for 
its built-in coordination features, relatively high level of ab- 
straction and the availability of tools to do quick, graphical 
programming. 


8. EXPERIMENTAL RESULTS 


We are currently testing the system locally before deploy- 
ing it on the real web sites. A number of factors influence 
the outcome of a game. Previous works on language game 
simulations show that the number of peers, the cardinality 
of the object and label sets of each peer, the mutual overlap- 
ping of the object sets and the particular strategy adopted 
to play a game are critical factors. 

Our test bed has the following setup. It reproduces the
federation composed of the skirando, gulliver and moleskiing
web sites. The following table summarizes the main
characteristics of the datasets we are using:

                   | gulliver | moleskiing | skirando
Total items        | 38       | 179        | 69
gulliver overlap   | -        | 22 (12%)   | 8 (11%)
moleskiing overlap | 22 (57%) | -          | 51 (73%)
skirando overlap   | 8 (21%)  | 51 (28%)   | -
Complete overlap   | 6 (15%)  | 6 (3%)     | 6 (8%)


The overlapping objects in the test datasets were found man- 
ually. 

With regard to the details of peer interactions, we are
experimenting with a number of basic language game strategies.
An especially critical step in the game is the assessment
task, in charge of evaluating whether two route linearizations
represent the same ski route. In the current setup,
the DiCA primitive that performs this step uses a string
comparison technique based on a bipartite matching algorithm.
It works as follows. The linearizations are divided into
tokens and schema information is dropped. This leaves two
sets of tokens, in our example {3002, Altissimo di Nago, da 
S.Giacomo, Trentino, MORI, Valle dell’Adige (Alto Garda - 
Baldo), MS, SE, 1150, 2078, 930, S. Giacomo} and {3940, 
Altissimo di Nago, Adamello, Da San Giacomo, AD, S4, 
1194, 2079, 900, E}. A bipartite graph matching algorithm 
is then used, given a distance function, to find the opti- 
mal matching of tokens. Working on attribute values over- 
comes the drawback of high variance in schema design by 
taking advantage of the redundancy of the data values. The 
fundamental assumption underlying this assessment method 
is that different representations of the same object share a 
significant part of their textual content and that, on the 
contrary, the contents of different objects are significantly 
different. For example, in our sample routes, the tokens 
“Altissimo di Nago” and “Giacomo” are present in both 
representations. Of course, this method is not applicable 
where the assumption does not hold. It is important to 
stress, however, that the assessment primitive, like all the 
other game primitives, is a parameter of a peer’s strategy. In 
other words, peers are free to choose the assessment method 
that best suits the characteristics of their object 
representations. 
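
The assessment step described above can be sketched in Python as follows. This is a minimal illustration, not the DiCA primitive itself: the text does not specify the distance function, so we assume one minus the similarity ratio of the standard `difflib.SequenceMatcher`, and we assume that the matching weight is the total similarity of the matched pairs divided by the size of the larger token set. The function names are ours. The optimal matching is computed with the Hungarian algorithm.

```python
from difflib import SequenceMatcher


def token_distance(a, b):
    """Assumed distance: 1 minus the difflib similarity ratio, ignoring case."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def optimal_assignment(cost):
    """Hungarian algorithm for a rows x cols cost matrix with rows <= cols.

    Returns, for each row, the column matched to it at minimal total cost.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)    # row potentials
    v = [0.0] * (m + 1)    # column potentials
    match = [0] * (m + 1)  # match[j] = row (1-based) assigned to column j
    way = [0] * (m + 1)    # previous column on the augmenting path
    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = match[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        while j0:  # augment along the alternating path
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    rows_to_cols = [0] * n
    for j in range(1, m + 1):
        if match[j]:
            rows_to_cols[match[j] - 1] = j - 1
    return rows_to_cols


def matching_weight(tokens_a, tokens_b):
    """Average similarity of optimally matched tokens, between 0 and 1."""
    if not tokens_a or not tokens_b:
        return 0.0
    small, large = sorted((tokens_a, tokens_b), key=len)
    cost = [[token_distance(s, t) for t in large] for s in small]
    assignment = optimal_assignment(cost)
    similarity = sum(1.0 - cost[i][j] for i, j in enumerate(assignment))
    return similarity / len(large)  # unmatched tokens lower the weight


# The two token sets of the running example.
route_a = ["3002", "Altissimo di Nago", "da S.Giacomo", "Trentino", "MORI",
           "Valle dell'Adige (Alto Garda - Baldo)", "MS", "SE", "1150",
           "2078", "930", "S. Giacomo"]
route_b = ["3940", "Altissimo di Nago", "Adamello", "Da San Giacomo", "AD",
           "S4", "1194", "2079", "900", "E"]
print(matching_weight(route_a, route_b))
```

Because the distance is computed on characters rather than on whole tokens, near-identical values such as “da S.Giacomo” and “Da San Giacomo” still contribute to the weight, which is the redundancy the method relies on.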

Given an object, the primitive then ranks the local routes 
by matching weight. On our data, this simple method 
effectively detects the correct matching, if one exists. If 
no matching exists, it still provides an answer, which can 
be interpreted as the best approximation for the given 
example route, i.e., the “nearest” ski route according to 
the distance metric in use. 
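
The ranking behaviour can be sketched as follows. The assessment function is passed in as a parameter, mirroring the idea that it belongs to the peer's strategy. The default used here, a plain Jaccard overlap of tokens, is only a stand-in we chose to keep the example short; it is not the bipartite matching described above. The route identifiers and the tokens of `local-1` are invented for illustration.

```python
def token_overlap(tokens_a, tokens_b):
    """Stand-in assessment: Jaccard overlap of lower-cased tokens."""
    a = {t.lower() for t in tokens_a}
    b = {t.lower() for t in tokens_b}
    return len(a & b) / len(a | b) if a | b else 0.0


def rank_local_routes(example, local_routes, assess=token_overlap):
    """Return (route_id, weight) pairs, best match first."""
    weighted = [(rid, assess(example, tokens))
                for rid, tokens in local_routes.items()]
    return sorted(weighted, key=lambda pair: pair[1], reverse=True)


example = ["3002", "Altissimo di Nago", "da S.Giacomo", "Trentino", "MORI",
           "Valle dell'Adige (Alto Garda - Baldo)", "MS", "SE", "1150",
           "2078", "930", "S. Giacomo"]
local_routes = {
    # Hypothetical unrelated route, invented for this example.
    "local-1": ["1234", "Monte Bianco", "Courmayeur", "AO", "4810", "3000"],
    # Second linearization of the running example.
    "local-2": ["3940", "Altissimo di Nago", "Adamello", "Da San Giacomo",
                "AD", "S4", "1194", "2079", "900", "E"],
}
ranking = rank_local_routes(example, local_routes)
print(ranking[0][0])  # identifier of the best-ranked local route
```

Note that the function always returns a ranking, even when every weight is zero; it is up to the peer's strategy to decide whether the top entry is a good enough match.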

We tested the system running various game sessions. In 
every game, either two or three peers were employed, playing 
the roles of gulliver, skirando and moleskiing. Every peer 
contained the routes in common with all the other peers 
participating in the game. Dynamic modifications to the 
federation or to individual peers’ object sets will be included 
in future trials. 

Figure 6 shows the plot of four sample game sessions. 
It shows the percentage of lexica convergence as a function 
of the number of games played by peers. 0% convergence 
means that there are no common associations among the 
peers: every peer is using different labels to encode the 
same object. Thus, the common lexicon is empty and 
inter-peer communication will fail. Conversely, 100% 
convergence indicates that all peers have reached an 
agreement on how to reference all shared objects. The 
common lexicon contains one entry for each shared object 
and thus inter-peer communication is always successful. 
Translated into our application, 0% convergence implies 
that no ski routes contained on remote sites have been 
associated with local routes. Therefore, no annotation 
produced on a remote site is available on the local site. 
100% convergence means that all remote routes have a local 
correspondent and that all annotations produced in the 
federation are available to all web sites, independently of 
the production site. 
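
One way to compute such a convergence percentage is sketched below. The exact formula is not given in the text, so this is an assumption: an object counts as converged when every peer holds it and all peers denote it with the same label, and the percentage is taken over all objects known to any peer. For the sake of the example we also assume that objects carry global identifiers (`r1`, `r2`), which real peers would not share; the peer lexica shown are invented.

```python
def convergence_percentage(lexica):
    """Percentage of objects that all peers denote with the same label.

    `lexica` maps each peer name to a dict from object id to label.
    """
    all_objects = set().union(*(lexicon.keys() for lexicon in lexica.values()))
    if not all_objects:
        return 0.0
    agreed = 0
    for obj in all_objects:
        labels = [lexicon.get(obj) for lexicon in lexica.values()]
        # Converged only if every peer has a label and the labels coincide.
        if None not in labels and len(set(labels)) == 1:
            agreed += 1
    return 100.0 * agreed / len(all_objects)


# Hypothetical snapshot: the peers agree on r1 but not yet on r2.
snapshot = {
    "gulliver":   {"r1": "label-a", "r2": "label-b"},
    "moleskiing": {"r1": "label-a", "r2": "label-c"},
    "skirando":   {"r1": "label-a", "r2": "label-b"},
}
print(convergence_percentage(snapshot))
```

Under this definition, an object held by only some of the peers can never count as converged, so a federation with partially overlapping object sets would stay below 100%.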

Some observations are possible about these results. First- 
ly, all games end in a 100% convergence state. This hap- 
pens because there is a complete overlapping of the object 
sets of all playing peers. In a more realistic situation there 
would be peers equipped with objects that are not common 
to other peers. In this case, the convergence process would 
stop before reaching 100%. 

Secondly, the increase in the convergence rate is not strict- 
ly monotonic. This can be explained as follows: As a result 
of a series of games among a subset of all peers, an associ- 
ation between an object and a label might be chosen that 
maximizes communicative success in this subset. However, 
this association might represent a suboptimal choice at the 
level of the whole federation. Hence the decrease in the 
overall convergence percentage. 

Manual checking has shown that the routes that end up 
being denoted by the same label in the shared lexicon are 
actually the same. However, this result depends essentially 
on the efficacy of the assessment module: in this case, it was 
able to correctly identify matching routes. 

In our tests, the convergence speed depends most strongly 
on the cardinality of the object sets. The smaller the 
cardinality, the lower the number of matchings to be 
established and, consequently, the faster the convergence. 
With regard to the number of peers, it is worth remarking 
that in the test setup we deal with a very limited 
federation, composed of either two or three peers. 
Therefore, it is impossible to meaningfully assess the 
impact of this factor on the lexicon formation. Nonetheless, 
on the basis of previous simulations, we expect the number 
of peers to also be a key factor in the convergence speed. 

Figure 6: Evolution of common lexicon formation. 


9. CONCLUSIONS 


In this paper we investigated the problem of building the 
distributed common reference systems needed to enrich cur- 
rent web applications and allow for their meaningful inter- 
operability. We introduced a novel approach to this problem 
based on the language games technique. To our knowledge, 
this paper is the first to describe a general architecture that 
can be used to deploy the technique in real applications on 
the web. Lastly, we presented our experience with a con- 
crete example of both the technique and the architecture in 
the field of ski mountaineering. 

There is wide scope for future work. The model underly- 
ing the language games technique is still fairly unsophisti- 
cated and we plan to use the experience gained from practi- 
cal experimentation to improve it. Along the same line, we 
expect to design more refined strategies to guide the games, 
in order to improve the lexicon building process. The use 
case we have shown here has some limitations. In particu- 
lar, there is a one-to-one mapping between objects and their 
linearization. We plan to test our approach on other, more 
complete, domains, e.g., the blogosphere. 

Finally, it is interesting to observe that the problem we 
considered in this paper is only one small subpart of the 
larger problem of providing an extension of the current web 
so that “information is given well-defined meaning, better 
enabling computers and people to work in cooperation” [5]. 
It is our hope that the approach we present here, although 
orthogonal to the ones employed by the mainstream Seman- 
tic Web initiative, can be a useful piece in the solution to 
this larger problem. We are encouraged by the results we 
present in this paper, since they show that this research 
trend, even at its beginning, can successfully be applied to 
real problems in real world scenarios. 